A Logistic Regression Model of Determiner Omission in PPs

Authors

  • Tibor Kiss
  • Katja Keßelmeier
  • Antje Müller
  • Claudia Roch
  • Tobias Stadtfeld
  • Jan Strunk
Abstract

The realization of singular count nouns without an accompanying determiner inside a PP (determinerless PP, bare PP, Preposition-Noun Combination) has recently attracted some interest in computational linguistics. Yet, the relevant factors for determiner omission remain unclear, and conditions for determiner omission vary from language to language. We present a logistic regression model of determiner omission in German based on data obtained by applying annotation mining to a large, automatically and manually annotated corpus.

1 The problem and how to deal with it

Preposition-Noun Combinations (PNCs, sometimes called determinerless PPs or bare PPs) minimally consist of a preposition and a count noun in the singular that, despite requirements formulated elsewhere in the grammar of the respective language, appears without a determiner. The noun in a PNC can be extended through prenominal modification (1) and postnominal complementation (2). Still, a determiner is missing. The following examples are from German.

(1) auf parlamentarische Anfrage (‘after being asked in parliament’), mit beladenem Rucksack (‘with loaded backpack’), unter sanfter Androhung (‘under gentle threat’)

(2) Er  wehrt   sich  gegen    die  Forderung  nach  Stilllegung  einer  Verbrennungsanlage.
    he  defies  REFL  against  the  demand     for   closedown    an     incineration plant
    ‘He defies the demand for closing an incineration plant.’

PNCs occur in a wide range of languages (Himmelmann, 1998); the conditions for determiner omission, however, have not yet been identified, and conditions applying to one language do not carry over to other languages. In addition, speakers only reluctantly judge the acceptability of newly coined PNCs, so that introspective judgments cannot be relied upon.

For English, Stvan (1998) and Baldwin et al. (2006) have claimed that the semantics of either the preposition or the noun plays a major role in determining whether a singular count noun may appear without a determiner in a PNC. Stvan (1998) assumes that nouns determine the well-formedness of PNCs (3) if the denotation of the noun occurs in a particular semantic field, while Baldwin et al. (2006) assume that certain prepositions impose selection restrictions on their nominal complements that allow for determiner omission (4).

(3) from school, at school, in jail, from jail, ...

(4) by train, by plane, by bus, by pogo stick, by hydrofoil ...

Interestingly, Le Bruyn et al. (2009) have observed that basic assumptions of Stvan’s analysis do not apply to Dutch, French, or Norwegian. With regard to German, we observe that neither the pattern in (3) nor the pattern in (4) is productive. Constructions like (4) cannot be realized as PNCs in German, but require full PPs.

In the following, we propose an analysis of PNCs that combines corpus annotation, annotation mining (Chiarcos et al., 2008), and logistic regression modeling (Harrell, 2001). Annotation mining assumes that linguistically relevant generalizations can be derived in a bottom-up fashion from a suitably annotated corpus. Relevant hits in the corpus are mapped into a feature vector that serves as input for logistic regression classification. In the present case, the input consists of sentences containing either PNCs or PPs. Binary logistic regression suggests itself as a classification method since the problem of PNCs can be rephrased as the following question: Under which conditions can an otherwise obligatory determiner be omitted?

The majority of the required annotations can be derived automatically, but no systems are available for the automatic determination of preposition senses in German, so preposition sense annotation has to be carried out manually and requires a language-specific tagset for preposition senses. While our initial analysis is based on German data, the general methodology can be applied to other languages, provided that the corpora are suitably annotated.
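The mapping from annotated corpus hits to feature vectors, followed by binary logistic regression, can be illustrated with a minimal sketch. Everything specific in it is an assumption made for illustration: the three binary features, the toy rows, and the plain gradient-descent fit are not taken from the paper, whose actual feature set, corpus data, and fitting procedure are not given in this abstract. The outcome is coded as 1 when the determiner is omitted (PNC) and 0 when it is realized (full PP).

```python
import math

# Hypothetical binary features per preposition + noun hit (NOT the paper's feature set):
#   [prenominal_adjective, postnominal_complement, preposition_has_sense_X]
# Label: 1 = determiner omitted (PNC), 0 = determiner realized (full PP).
# The rows are invented toy data for illustration only, not corpus counts.
TOY_HITS = [
    ([1, 0, 1], 1),
    ([1, 1, 1], 1),
    ([0, 1, 1], 1),
    ([0, 0, 1], 1),
    ([1, 0, 0], 0),
    ([0, 0, 0], 0),
    ([0, 1, 0], 0),
    ([1, 1, 0], 0),
]


def sigmoid(z):
    """Logistic function mapping a linear score to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(weights, bias, features):
    """Estimated probability that the determiner is omitted."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return sigmoid(z)


def fit(hits, learning_rate=0.5, epochs=2000):
    """Fit weights by batch gradient descent on the mean log-loss."""
    n_features = len(hits[0][0])
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for features, label in hits:
            error = predict_proba(weights, bias, features) - label
            for j, x in enumerate(features):
                grad_w[j] += error * x
            grad_b += error
        weights = [w - learning_rate * g / len(hits)
                   for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b / len(hits)
    return weights, bias


weights, bias = fit(TOY_HITS)
# Same noun-phrase shape, differing only in the hypothetical sense feature.
p_omit = predict_proba(weights, bias, [1, 0, 1])
p_keep = predict_proba(weights, bias, [1, 0, 0])
print(f"P(omission | sense X)    = {p_omit:.3f}")
print(f"P(omission | no sense X) = {p_keep:.3f}")
```

In this toy setting the sense feature alone separates the two classes, so the fitted model assigns a high omission probability only when it is present. With real annotated data one would more likely use an established statistics package and inspect the fitted coefficients to see which annotated properties favor or disfavor omission.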

Similar resources

Antonymic Prepositions and Weak Referentiality (3.10)

Analyses that treat determiner omission in terms of weak referentiality have recently been adopted for determinerless PPs. A missing discourse referent is involved in both cases (Farkas and de Swart 2003, Espinal and McNally 2011, de Swart 2012). With regard to the German prepositions mit and ohne, we will show that the former accepts the determiner omission reluctantly, while determiner omissi...


A NEW APPROACH FOR PARAMETER ESTIMATION IN FUZZY LOGISTIC REGRESSION

Logistic regression analysis is used to model a categorical dependent variable. It is commonly used in the social sciences and in clinical research. Human thoughts and disease diagnosis in clinical research contain vagueness. This situation leads researchers to combine fuzzy set and statistical theories. Fuzzy logistic regression analysis is one of the outcomes of this combination and it is used in situa...


Acquisition and Accurate Use of English Articles by Persian Speakers

This study examined Persian speakers’ article acquisition and use with reference to Ionin, Ko and Wexler’s (2004) model, which is based on the prediction of the Fluctuation Hypothesis (FH) that EFL learners whose first language is a [-article] language, like Persian, use articles erroneously in [+definite, -specific] and [-definite, +specific] contexts. From among the students of an ...


Comparison of ordinary logistic regression and robust logistic regression models in modeling of pre-diabetes risk factors

Background: Given the increased risk of developing type 2 diabetes in pre-diabetic people, identifying pre-diabetes and determining its risk factors is necessary. This study aims to compare ordinary logistic regression and robust logistic regression models in modeling pre-diabetes risk factors. Methods: This is a cross-sectional study conducted on 6460 people, over ...



Publication date: 2010